[57] ABSTRACT

A parallel computer including a plurality of processing
elements, each of the processing elements comprising a flag
address holding unit for temporarily holding an address of a 
send complete flag of a direct remote write message when 
the direct remote write message is sent to another processing 
element, and a flag update unit for exclusively updating a 
flag represented by the address held in the flag address 
holding unit when data indicated by the direct remote write 
message has been sent. 
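
The mechanism summarized in the abstract can be illustrated with a minimal software sketch. It is only a toy model under stated assumptions: the patent describes hardware, whereas the class `CommunicationControl`, the function `put`, the dictionary-based memories, and the lock standing in for the "exclusive" update are all hypothetical names and choices made for illustration. The parameter names `local_addr`, `remote_addr`, `size`, and `send_complete_flag` follow the PUT arguments given in the detailed description, and the increment by 1 follows the flag update described there.

```python
import threading


class CommunicationControl:
    """Toy model of a flag address holding unit plus a flag update unit (hypothetical)."""

    def __init__(self, memory):
        self.memory = memory           # local memory modeled as a dict: address -> value
        self.held_flag_addr = None     # single-slot "flag address holding unit"
        self._lock = threading.Lock()  # assumption: a lock models the exclusive update

    def hold(self, flag_addr):
        # Temporarily hold the flag address carried by the message.
        self.held_flag_addr = flag_addr

    def update_flag(self):
        # Exclusive read-modify-write of the flag at the held address.
        with self._lock:
            current = self.memory.get(self.held_flag_addr, 0)
            self.memory[self.held_flag_addr] = current + 1


def put(sender_mem, receiver_mem, local_addr, remote_addr, size, send_complete_flag):
    """Copy `size` words from sender to receiver, then mark the send as complete."""
    ctrl = CommunicationControl(sender_mem)
    ctrl.hold(send_complete_flag)          # hold the flag address when the message is sent
    for i in range(size):                  # direct remote write, no intermediate buffer
        receiver_mem[remote_addr + i] = sender_mem[local_addr + i]
    ctrl.update_flag()                     # update the flag once the data has been sent
```

In this sketch the sender's software would poll the word at `send_complete_flag` instead of taking an interrupt, which is the overhead reduction the abstract aims at.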
PARALLEL COMPUTER WHICH VERIFIES DIRECT DATA TRANSMISSION BETWEEN LOCAL MEMORIES WITH A SEND COMPLETE FLAG

This is a continuation of Ser. No. 08/408,306 filed on Mar. 22, 1995, now abandoned.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a parallel computer comprising a plurality of processing elements, in particular, to a parallel computer for reducing overhead of software so as to improve efficiency of data processing.

2. Description of the Related Art

When a parallel computer that performs communication by message passing executes a global operation, it treats data used in the global operation as messages. In other words, each processing element of the parallel computer sends data necessary for a global operation to a buffer of a designated receiver processing element in the same manner as a conventional message. Then the receiver processing element searches a buffer by software and copies the data to a user memory region to receive the message data. In this case, a conventional memory is used for the buffer. Since it is necessary to search the buffer and copy a message from the buffer, overhead of software is large.

To solve such a problem, a method for using active messages such as PUT/GET is known.

When a processing element uses a PUT message, data can be directly transferred from a user region of the processing element (sender) to a user region of another processing element (receiver) without using a buffer. When a processing element uses a GET message, data can be directly transferred from the user region of another processing element (sender) to the user region of the processing unit (receiver) without using the buffer.

Thus, the overhead involved in the receiving process of message passing can be eliminated. In addition, the communication and calculation can be overlapped. However, in the case of PUT/GET, unlike with the message passing, there is no explicit receive command; therefore means for detecting the reception of a message is required.

Thus, to implement the PUT/GET in a conventional parallel computer, when the computer receives a message, it activates a software handler using an interrupt so as to exclusively update a flag that represents the reception of the message in a system mode or the like to detect the reception of the message.

However, in the above described configuration, overhead of the software is large.

SUMMARY OF THE INVENTION

An object of the present invention is to provide a parallel computer that uses messages such as PUT/GET for direct data transmission from a local memory to another local memory with reduced software overhead.

The parallel computer according to the present invention comprises a plurality of processing elements. Each processing element comprises a flag address holding unit and a flag update unit.

When a processing element 1 (PE1) sends a direct remote write message to another processing element 2 (PE2), the flag address holding unit of PE1 temporarily holds the address of a send complete flag included in the message. When data indicated by the direct remote write message has been sent, the flag update unit of PE1 exclusively updates a flag indicated by the address of the send complete flag held in the flag address holding unit to represent that the data has been sent.

When PE1 receives a direct remote write message from PE2, the flag address holding unit of PE1 temporarily holds the address of a receive complete flag contained in the message. When data indicated by the direct remote write message has been received, the flag update unit of PE1 exclusively updates a flag indicated by the address of the receive flag held in the flag address holding unit to represent that the data has been received.

When PE1 has sent a direct remote read message to PE2 and PE1 has completed receiving reply data in response to the message from PE2, PE1 exclusively updates a flag indicated by the address of an acquisition complete flag included in the direct remote read message to represent that the data acquisition has been completed.

When PE1 receives a direct remote read message from PE2, the flag address holding unit of PE1 temporarily holds the address of a reply complete flag of the message. When PE1 has completed sending data indicated by the direct remote read message, the flag update unit of PE1 exclusively updates a flag indicated by the address of the reply complete flag held in the flag address holding unit to represent that the data has been replied.

In a second aspect of the parallel computer according to the present invention, each processing element comprises a dedicated communication register. The dedicated communication register comprises a plurality of registers and a plurality of flags corresponding thereto. Each of the registers stores data indicated by a direct remote write message received from another processing element. Each of the flags represents the data storage state of the corresponding register. Each processing element references data stored in the dedicated communication register so as to reference data of the other processing element and performs data processing.

These and other objects, features and advantages of the present invention will become more apparent in light of the following detailed description of a best mode embodiment thereof, as illustrated in the accompanying drawings.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 is a schematic diagram showing a theoretical construction of the present invention;

FIG. 2 is a schematic diagram showing a theoretical construction of the present invention;

FIGS. 3(a) and 3(b) are diagrams for explaining basic formats of PUT/GET;

FIG. 4 is a block diagram showing an embodiment of the present invention;

FIG. 5 is a time chart showing a process in issuing PUT;

FIG. 6 is a time chart showing a process in issuing PUT;

FIG. 7 is a time chart showing a process in issuing GET;

FIG. 8 is a schematic diagram for explaining a flag update process that is performed when a message is received;

FIG. 9 is a block diagram showing an embodiment of the present invention; and

FIG. 10 is a schematic diagram showing an execution example of a global operation.

DESCRIPTION OF PREFERRED EMBODIMENTS

FIGS. 1 and 2 show theoretical constructions of the present invention. In these drawings, a plurality of processing elements 1 construct a parallel computer. An inter-connecting network 2 connects each processing element 1.

Each processing element 1 shown in FIG. 1 comprises a processor 10, a memory 11, and a communication control unit 12.

The communication control unit 12 comprises a memory control unit 13, a flag address holding unit 14, and a flag update unit 15. The memory control unit 13 controls the memory 11. The flag address holding unit 14 temporarily holds the address of a flag included in the header of direct remote write/read messages that are sent/received to and from another processing element 1. The flag update unit 15 exclusively updates a flag indicated by the flag address held in the flag address holding unit 14.

It is a feature of the present invention shown in FIG. 1 that the communication control unit 12, which is constructed of hardware, comprises the flag address holding unit 14 and the flag update unit 15.

When the processor 10 of a processing element 1 (sender) sends a direct remote write message such as PUT to another processing element 1 (receiver) (the direct remote write message causes data to be written into a memory or a register of the receiver), the flag address holding unit 14 temporarily holds the address of a send complete flag contained in the header of the direct remote write message. When data indicated by the direct remote write message has been sent, the flag update unit 15 exclusively updates a flag indicated by the address of the send complete flag held in the flag address holding unit 14 so as to set a flag that represents that data indicated by the direct remote write message has been sent.

At this point, the flag address holding unit 14 also holds the address of an acknowledgement flag. When the processing element 1 (sender) sends the direct remote write message to another processing element 1 (receiver), the flag update unit 15 updates the acknowledgement flag. In addition, when the sender receives a receive complete message from the receiver in response to the direct remote write message, the flag update unit 15 updates the acknowledgement flag in inverse mode so as to set a flag that represents that the receiver has received the data indicated by the direct remote write message.

When a processing element 1 (receiver) receives a direct remote write message such as PUT from another processing element 1 (sender), the flag address holding unit 14 of the receiver temporarily holds the address of a receive complete flag contained in the header of the direct remote write message. When the receiver has received data indicated by the direct remote write message, the flag update unit 15 exclusively updates a flag indicated by the address of the receive complete flag held in the flag address holding unit 14 so as to set a flag that represents that the receiver has received the data indicated by the direct remote write message.

In addition, when a processing element 1 has received reply data in response to a direct remote read message such as GET (the direct remote read message causes data to be read directly from a memory or a register of another processing element), the flag update unit 15 exclusively updates a flag indicated by the address of the acquisition complete flag contained in the header of the direct remote read message so as to set a flag that represents that the processing element 1 has acquired the data indicated by the direct remote read message.

When a processing element 1 (sender) receives a direct remote read message such as GET from another processing element 1 (receiver), the flag address holding unit 14 temporarily holds the address of a reply complete flag contained in the header of the direct remote read message. When the sender has sent the data indicated by the direct remote read message (GET), the flag update unit 15 exclusively updates a flag indicated by the address of the reply complete flag held in the flag address holding unit 14 so as to set a flag that represents that the sender has replied the data indicated by the direct remote read message.

According to the present invention shown in FIG. 1, when messages such as PUT/GET that directly access a memory are used, a flag that is used to protect send/receive regions for sending and receiving messages is updated by hardware. Thus, unlike with the conventional system, the operation of the processor 10 is not affected by an interrupt. Consequently, by combining with a system that sends a request of PUT/GET without blocking the processor 10, communication and calculation of data can be completely overlapped, thereby remarkably improving the executing efficiency of the parallel computer.

In FIG. 2, the processing element 1 comprises a processor 10, a memory 11, a memory control unit 13, and a dedicated communication register 16.

The dedicated communication register 16 comprises a plurality of registers and a plurality of flags corresponding thereto. Each register stores data indicated by a direct remote write message received from another processing element 1. Each flag manages a bit value representing the data storage state of the corresponding register.

It is a feature of the present invention shown in FIG. 2 that the processing element 1 comprises the dedicated communication register 16, which comprises the registers (which store data indicated by the direct remote write message such as PUT received from another processing element 1) and the flags (which manage bit values representing data storage states of the corresponding registers).

In such a construction, the processing element 1 performs operations for both data stored in the dedicated communication register 16 and local data and sends the calculated resultant data to the dedicated communication register of another processing element 1 using the direct remote write message so as to execute a global operation.

In addition, a processing element 1 can send broadcast data to the dedicated communication register 16 of another processing element 1 using the direct remote write message, to execute a broadcast process.

Moreover, a processing element 1 can perform an operation for both data stored in the dedicated communication register 16 and local data and send the calculated resultant data to the dedicated communication register 16 of another processing element 1 using the direct remote write message to perform a barrier synchronizing process.

In addition, a processing element 1 can perform a predetermined operation for data stored in the dedicated communication register 16 and send the calculated resultant data to the dedicated communication register 16 of another processing element 1 using the direct remote write message to perform a recognizing process of the status of barrier synchronization.

Thus, according to the present invention shown in FIG. 2, when a message such as PUT that directly accesses the memory is used, the dedicated communication register 16 is used as the destination of data indicated by the message, thereby remarkably reducing the overhead for accessing the memory. In addition, since the data receive state in the dedicated communication register 16 is represented with a
flag, the access of the processor 10 to the dedicated communication register 16 can be controlled by hardware using the flag. Thus, the overhead of the software can be eliminated. Consequently, the global operation process, the broadcast process, and the barrier synchronizing process can be executed at higher speed.

Next, preferred embodiments of the present invention will be described in detail.

In the parallel computer according to the present invention, messages (such as PUT and GET) that cause data to be read and written from and to the memory or register of another processing element 1 are used.

FIGS. 3(a) and 3(b) show basic formats of PUT and GET used in the present invention, respectively.

As shown in FIG. 3(a), PUT used in the present invention has arguments of dest_cid (the ID of a processing element 1 (receiver) to which data will be sent), local_addr (the address of the local memory of the processing element 1 (sender) where send data is stored), size (the size of data to be sent), remote_addr (the address of the local memory of the receiver to which data will be sent), send_complete_flag (the address of the send complete flag that informs the software of the sender that data has been sent), put_flag (the address of the put flag that informs the software of the receiver that data has been received), and ack (the value which directs whether or not the ack flag, which informs the software of the sender that the data has been received, is used).

When a processing element 1 (sender) sends data to another processing element 1 (receiver) using PUT, the processor 10 sets the ID of the receiver to dest_cid. The address of the local memory of the data to be sent is set to local_addr. The size of data to be sent is set to size. The address of the local memory of the receiver is set to remote_addr. The address of the send complete flag is set to send_complete_flag. The address of the put flag is set to put_flag. When the ack flag is enabled, "1" is set to ack. When the ack flag is disabled, "0" is set to ack.

On the other hand, GET used in the present invention has arguments of dest_cid (the ID of a processing element 1 (sender) from which data is received), local_addr (the address of the local memory of the processing element 1 (receiver) to which required data will be stored), size (the size of required data), remote_addr (the address of the local memory of the sender in which required data is stored), get_flag (the address of the get flag that informs the software of the receiver that the data has been acquired), and send_complete_flag (the address of the send complete flag that informs the software of the sender that data has been replied).

In other words, when a processing element 1 (receiver) acquires data from another processing element 1 (sender) using GET, the processor 10 sets the ID of the sender from which data is received to dest_cid. The address of the local memory to which required data will be stored is set to local_addr. The size of the required data is set to size. The address of the local memory in which the required data is stored is set to remote_addr. The address of the get flag is set to get_flag. The address of the send complete flag is set to send_complete_flag.

FIG. 4 shows an embodiment of the communication control unit 12 shown in FIG. 1. Next, referring to FIG. 4, the construction of the communication control unit 12 will be described in detail.

In FIG. 4, a transmission request command queue 20 queues PUT and GET requested by the processor 10. A command process unit 21 interprets a transmission request queued in the transmission request command queue 20 and issues a data transmission request. A transmission control unit 22 executes a data transmission process corresponding to a data transmission request issued by the command process unit 21.

A memory control unit 23 has a DMA (Direct Memory Access) function so as to read and write data from and to the memory 11. A receive control unit 24 receives data from another processing element 1. A flag address holding unit 25 temporarily holds the addresses of flags that PUT and GET have. A flag update unit 26 exclusively updates a flag indicated by the flag address held in the flag address holding unit 25.

A reply command queue 27 queues GET received from another processing element 1. A command process unit 28 interprets GET queued in the reply command queue 27 and issues a data transmission request. A transmission control unit 29 executes a data transmission process corresponding to a data transmission request issued by the command process unit 28.

Next, with reference to flow charts of FIGS. 5 to 7, the operation of the communication control unit 12 will be described.

When the command process unit 21 of the processing element 1 sends data designated by PUT in response to a read transmission request that is read from the transmission request command queue 20, the command process unit 21 of the processing element 1 directs the memory control unit 23 to read data of the size designated by PUT starting from the address local_addr designated by PUT. Thereafter, the command process unit 21 directs the transmission control unit 22 to transmit the read data to the receiver processing element 1 along with PUT.

When this process is started, the command process unit 21 informs the memory control unit 23 of the address of the send complete flag and the ack value designated by PUT and directs it to perform a flag update process for the send complete flag and ack flag. When the memory control unit 23 receives the update request, it stores the address of the send complete flag to the flag address holding unit 25. When the ack flag is enabled, the memory control unit 23 informs the flag update unit that the ack flag is enabled.

When the flag update unit 26 is informed that the ack flag is enabled, it exclusively obtains the value pointed to by the address of the ack flag held in the flag address holding unit 25 and increments the value of the flag by "1". Since the ack flag is provided as a common flag in each processing element, the address of the ack flag is pre-held in the flag address holding unit 25.

Thereafter, the memory control unit 23 reads the required data from the memory 11 and sends it to the command process unit 21. Thus, the data is sent to a receiver processing element 1 designated by PUT. When the receiver processing element 1 has read all data to be sent, the memory control unit 23 informs the flag update unit 26 of the completion of the data read process.

When the flag update unit 26 is informed of the completion of data transmission, the flag update unit 26 exclusively obtains the flag value pointed to by the address of the send complete flag held in the flag address holding unit 25, that is, the flag value of the send complete flag, and increments the flag value by "1". When the flag update unit 26 receives a receive complete message from the receiver processing element 1 in response to the data transmission, if the ack value represents that the ack flag is enabled, the flag update unit 26 exclusively obtains the flag value pointed to by the
address of the ack flag held in the flag address holding unit 25, that is, the flag value of the ack flag, and decrements the flag value by "1".

As shown in FIG. 5, when the sender processing element 1 has sent data designated by PUT, the flag value of the send complete flag is changed to "1". When a function amcheck detects that the flag value becomes "1", the software that issued the PUT is notified of the completion of the data transmission. As shown in FIG. 6, when the sender processing element 1 starts sending data designated by PUT, the flag value of the ack flag is set to "1". When the sender processing element 1 receives a receive complete message which represents that the receiver processing element 1 has received the data, the flag value of the ack flag is set to "0". When a function ack_check( ) detects that the flag value of the ack flag is "0", the software that issued the PUT is notified of the completion of the data reception by the processing element 1.

Thus, according to the present invention, the sending region of the memory can be protected without using an interrupt. In the embodiment shown in FIG. 5, when the function amcheck detects that the flag value of the send complete flag is changed to "1", the software is notified of the completion of the data transmission. However, when the flag value of the send complete flag becomes a value other than "1", the software may be notified of the completion of the data transmission.

When a processing element 1 receives data designated by PUT, the receive control unit 24 thereof directs the memory control unit 23 to write received data of the size designated by PUT to a region of the memory 11 starting from the address remote_addr designated by PUT.

When this process is started, the receive control unit 24 informs the memory control unit 23 of the address of the put flag designated by PUT and directs it to update the put flag. When the memory control unit 23 receives the update request, it makes the flag address holding unit 25 hold the address of the put flag.

Thereafter, the memory control unit 23 writes the received data to the memory 11. When the memory control unit 23 has completed the data write process, it informs the flag update unit 26 of the completion of the data reception. When the flag update unit 26 is informed of the completion, it exclusively obtains the flag value pointed to by the address of the put flag held in the flag address holding unit 25, that is, the flag value of the put flag, and increments the flag value by "1".

When the receiver processing element 1 has received data designated by PUT, the flag value of the put flag is changed to "1". The software of the receiver processing element 1 is notified of the completion of the data reception when a function amcheck detects that the flag value is "1".

Thus, according to the present invention, the receiving region of the memory can be protected without using an interrupt. In the embodiment shown in FIG. 5, when the function amcheck detects that the flag value of the put flag is changed to "1", the software is notified of the completion of the data reception. However, when the flag value of the put flag is changed to a value other than "1", the software may be notified of the completion of the data reception.

When a receiver processing element 1 sends GET corresponding to a transmission request read from the transmission request command queue 20, the command process unit 21 directs the transmission control unit 22 to send GET to a sender processing element 1 from which data is received.

When the receiver processing element 1 receives information with respect to the issued GET and the required data from the sender processing element 1, the receive control unit 24 of the receiver processing element directs the memory control unit 23 to write required data of the size designated by GET to a region of the memory 11 starting from the address local_addr designated by GET.

When the memory control unit 23 has completed the write process for the required data, it sends the address of the get flag designated by GET to the flag update unit 26 and informs it of the completion of the required data reception. When the flag update unit 26 is informed of the completion, it exclusively obtains the flag value pointed to by the address of the get flag, that is, the flag value of the get flag, and increments the flag value by "1".

As shown in FIG. 7, when the receiver processing element 1 has received the required data designated by GET, the flag value of the get flag is changed to "1". When the function amcheck detects that the flag value is changed to "1", the software that has sent GET is notified of the completion of the required data reception.

Thus, according to the present invention, the software can be informed of the receive state of the requested data without using an interrupt. In the embodiment shown in FIG. 7, when the function amcheck detects that the flag value of the get flag is changed to "1", the software is notified of the completion of the required data reception. However, when the flag value of the get flag is changed to a value other than "1", the software may be notified of the completion of the data reception.

When the receive control unit 24 of the processing element 1 receives GET, it queues the GET in the reply command queue 27. When the command process unit 28 sends data required by the queued GET, it directs the memory control unit 23 through the transmission control unit 29 to read data of the size designated by GET starting from the address remote_addr designated by GET from the memory 11 and directs the transmission control unit 29 to send the information with respect to the GET and the read data to the processing element 1 that sent GET.

When this process is started, the command process unit 28 sends the address of the send complete flag designated by GET to the memory control unit 23 through the transmission control unit 29 and directs it to update the send complete flag. The memory control unit 23 stores the address of the send complete flag in the flag address holding unit 25 in response to this request.

Thereafter, the memory control unit 23 reads the required data from the memory 11 and sends it to the transmission control unit 29. Thus, the required data is sent to the processing element 1 that sent GET. After the memory control unit 23 has completed the read process of the requested data, it informs the flag update unit 26 of the completion of the required data transmission.

When the flag update unit 26 is informed of the completion, it exclusively obtains the flag value stored at the address of the send complete flag held in the flag address holding unit 25, that is, the flag value of the send complete flag, and increments the flag value by "1".

As shown in FIG. 7, when the sender processing element 1 has sent required data designated by GET, the flag value of the send complete flag is changed to "1". When the function amcheck detects that the flag value is changed to "1", the software of the processing element 1 that received GET is notified of the completion of the required data transmission.

Thus, according to the present invention, the sending region of the memory can be protected without using an






6,115,803 


interrupt. In the embodiment shown in FIG. 7, when the function amcheck detects that the flag value of the send complete flag is changed to "1", the software is notified of the completion of the required data transmission. However, when the flag value is changed to a value other than "1", the software may be notified of the completion of the required data transmission.

FIG. 8 shows a flag update process that is performed when a message is received.

A header analyzing unit 80 in the receive control unit 24 analyzes a message received from the inter-connecting network 2 and extracts an address of a flag (flag_addr) 81a, the starting address of data to be sent (data_addr) 81b, and a size (size) 81c of the data to be sent. The extracted flag_addr 81a is output to the flag address holding unit 25 to be held in the flag address holding unit 25. The data_addr 81b and the size 81c are output to the memory control unit 23.

When a DMA setting unit 82 of the memory control unit 23 receives the data_addr 81b and size 81c from the receive control unit 24, it sets the DMA (Direct Memory Access) corresponding to these data and directs the DMA process unit 83 to activate DMA. The DMA process unit 83 performs the DMA process between the memory 11 and the transmission control unit 29 based on a command from the DMA setting unit 82. When a DMA complete detecting unit 84 detects the completion of the DMA process, it directs the flag address load unit 85 to load the address of the flag. The flag address load unit 85 reads the address of the flag from the flag address holding unit 25 in response to a command from the DMA complete detecting unit 84. A flag data load unit 86 reads the value of the flag stored in the memory 11 corresponding to the address of the flag. A flag update requesting unit 87 sends the read data to the flag update unit 26 and directs the flag update unit 26 to update the flag. The flag data updated by the flag update unit 26 is sent back to the memory control unit 23. A flag data storing unit 88 writes the flag data at the same flag address of the memory 11.

As described above, when the data transmission indicated by the received message has been completed, a flag that indicates the completion of the data transmission is updated.

[...] reference control unit 35 that executes a reference process of data stored in the dedicated communication register 16.

In the processing element 1 according to the present invention configured as described above, when the processor 10 writes data to a predetermined address of a shared memory space, the memory control unit 13 generates packets and queues them in the transmission request command queue 30. The command process unit 31 successively sends the queued packets to another processing element 1 using PUT.

The processor 10 can send data to a desired processing element 1 with only a store command for a predetermined address. The data will be stored in the dedicated communication register 16 of a processing element 1 that is mapped to the predetermined address.

When the receive control unit 34 receives data sent from another processing element 1 with PUT or in response to GET, the memory control unit 13 writes the data to a designated register of the dedicated communication register 16. When the data has been written to the designated register, a flag corresponding to the register is set to "1", which represents that data is stored.

The processor 10 acquires data necessary for a particular operation from another processing element 1 by issuing a load command with a register number of the dedicated communication register 16 in which the data is stored. The load command is sent to the register reference control unit 35. When the bit value of the flag corresponding to the register number is "1", the register reference control unit 35 reads data from the register of the dedicated communication register corresponding to the register number and sends the data to the processor 10. When the bit value is "0", the register reference control unit 35 waits until the bit value becomes "1". When the bit value becomes "1", the register reference control unit 35 reads data and sends it to the processor 10. After the register reference control unit has read the data, the bit value is reset to "0".

The processor 10 can receive desired data with only one load command.

According to this embodiment, the dedicated communication register 16 which is nearer to the processor 10 than memory 11 is provided. The dedicated communication

In the above-described embodiment, when the address register 16 is used as a destination of transmission by PUT 

value of the flag included in the message is set to zero, the to reduce the overhead for the memory access. In addition, 

flag update process is not performed. 45 since a flag that represents data storage stage of the register 

In the embodiment shown in FIG. 4, to raise the speed of 16 is provided, the access of the register from the processor 

the reply process, the reply c6^mand^queue-is_provided 10 can be controlled by hardware. When the flag of the 

independently. However, a common queue-can-be-usedJqr) dedicated communication register 16 represents that neces- 

the reply-command queue 27 and the transmission request sary data is not stored yet, it is preferable to process another 

command queue 20. In this case, the command process unit 50 task in ready state. 

28 and the transmission control unit 29 can be omitted and with the above configuration, the processor 10 can 

J thereby the construction can be simplified. execute the global operation process, the broadcast process, 

FIG. 9 shows an embodiment of the processing element 1 and the barrier synchronizing process at high speed, 

of FIG. 2. Next, the construction of this embodiment will be [ n other words, since the processor 10 performs an 

described in detail. In FIG. 9, the same units as those in FIG. 55 operation process for both data stored in the dedicated 

2 are denoted by the same reference numerals. communication register 16 and local data and sends the 

In FIG. 9, a transmission request command queue 30 calculated resultant data to the dedicated communication 

queues PUT and GET requested by the processor 10. A register 16 of another processing element 1 using PUT, the 

command process unit 31 interprets a transmission request global operation process can be executed at high speed. In 

queued in the transmission request command queue 30 and 60 this case, for example, to inform another processing element 

issues a data transmission request. A transmission control 1 of the processing element 1 with the maximum value of the 

unit 32 that executes a data transmission process corre- calculated resultant data an ID number of the processing 

sponding to a data transmission request issued by the element 1 may be sent to the dedicated communication 

command process unit 31. register 16. In other words, in addition to the calculated 

A receive buffer 33 that temporarily stores data sent from 65 resultant data, the ID number of the processing element 1 

another processing element 1. A receive control unit 34 that with particular calculated resultant data can be sent to the 

receives data from another processing element 1. A register dedicated communication register 16. 
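As an illustration only, the register-and-flag behavior and the global operation described above can be modeled in a few lines of Python. This is a minimal sketch under stated assumptions, not the hardware of the embodiment. The names CommRegisterFile, put, load, and global_op are hypothetical. The model runs sequentially, so where the register reference control unit 35 would wait for a flag to become "1", the sketch raises an exception instead. The exchange pattern (round k sends each partial result 2^k elements ahead) follows the crossover method that the description gives for the FIG. 10 example.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


class CommRegisterFile:
    """Toy model of one element's dedicated communication register 16.

    Each register has a one-bit flag: 1 means "data is stored", 0 means empty.
    """

    def __init__(self, num_registers: int) -> None:
        self.data = [None] * num_registers
        self.flag = [0] * num_registers

    def put(self, reg: int, value) -> None:
        # A remote PUT writes the register and sets its flag to 1.
        self.data[reg] = value
        self.flag[reg] = 1

    def load(self, reg: int):
        # The hardware would make the processor wait while the flag is 0.
        # This sequential model cannot wait, so it raises instead (assumption).
        if self.flag[reg] == 0:
            raise RuntimeError(f"register {reg} is empty; hardware would wait here")
        value = self.data[reg]
        self.flag[reg] = 0  # reading resets the flag to 0
        return value


def global_op(local: List[T], op: Callable[[T, T], T]) -> List[T]:
    """Combine one value per element so that every element ends with the result.

    Round k sends each partial result 2**k elements ahead (wrapping around),
    so the element count must be a power of two.
    """
    n_pe = len(local)
    if n_pe == 0 or n_pe & (n_pe - 1):
        raise ValueError("element count must be a power of two")
    rounds = n_pe.bit_length() - 1  # n rounds for 2**n elements
    regs = [CommRegisterFile(max(rounds, 1)) for _ in range(n_pe)]
    partial = list(local)
    for k in range(rounds):
        # Every element PUTs its partial result into register k of its partner.
        for pe in range(n_pe):
            regs[(pe + (1 << k)) % n_pe].put(k, partial[pe])
        # Every element then loads register k and combines it with its own value.
        partial = [op(partial[pe], regs[pe].load(k)) for pe in range(n_pe)]
    return partial


# Global sum over four elements: every element ends with 10.
print(global_op([1, 2, 3, 4], lambda x, y: x + y))
# Global maximum carrying the ID of the element that holds it: (value, id) pairs.
print(global_op([(3, 0), (9, 1), (4, 2), (1, 3)], max))
```

In this sketch each element uses one register per round, which matches the statement that 2^n processing elements need n registers each. Sending (value, ID) pairs and combining them with max is one way to model sending the ID number of the element that holds the maximum value.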
In addition, when the processor 10 sends broadcast data to the dedicated communication register 16 of another processing element 1, the broadcast process can be executed at high speed.

The processor 10 also performs an operation for both data stored in the dedicated communication register 16 and local data and sends the calculated resultant data to the dedicated communication register 16 of another processing element 1 using PUT to execute the barrier synchronizing process at high speed. For example, when the processing element 1 comes to a barrier synchronizing point, "1" is output to another processing element 1. When the sum of the output values reaches the number of processing elements 1, it is determined that the barrier synchronization has been established. Thus, the barrier synchronizing process can be executed at high speed.

In addition, the processor 10 performs a predetermined operation for data stored in the dedicated communication register 16 and sends the calculated resultant data to the dedicated communication register 16 of another processing element 1 using PUT to execute the recognizing process of the status of the barrier synchronization at high speed. For example, the processing element 1 sends a state value at the barrier synchronizing point to another processing element 1 or performs an AND operation and an OR operation for the status value to detect the status value at the barrier synchronizing point. Thus, the recognizing process of the status of the barrier synchronization can be executed at high speed.

Next, with reference to FIG. 10, an example of the global operation executed by an embodiment of the present invention will be described. In this example, a global operation that calculates the sum of local data of four processing elements is considered.

To calculate the sum of the local data, at step 1, the processing element 1 of a cell A sends local data a to the dedicated communication register 16 of the processing element 1 of a cell B that is adjacent to the cell A. The processing element 1 of the cell B sends local data b to the dedicated communication register 16 of the processing element 1 of a cell C that is adjacent to the cell B. The processing element 1 of the cell C sends local data c to the dedicated communication register 16 of the processing element 1 of a cell D that is adjacent to the cell C. The processing element 1 of the cell D sends local data d to the processing element 1 of the cell A that is adjacent to the cell D.

At step 2, the processing element 1 of the cell A calculates the sum of the local data a and the data d stored in the dedicated communication register 16 and sends the added value to the dedicated communication register 16 of the processing element 1 of the cell C that is away from the cell A by two cells according to the crossover method. The processing element 1 of the cell B calculates the sum of the local data b and data a stored in the dedicated communication register 16 and sends the sum to the dedicated communication register 16 of the processing element 1 of the cell D that is away from the cell B by two cells according to the crossover method. The processing element 1 of the cell C calculates the sum of the local data c and data b stored in the dedicated communication register 16 and sends the sum to the dedicated communication register 16 of the processing element 1 of the cell A that is away from the cell C by two cells according to the crossover method. The processing element 1 of the cell D calculates the sum of the local data d and data c stored in the dedicated communication register 16 and sends the sum to the dedicated communication register 16 of the processing element 1 of the cell B that is away from the cell D by two cells according to the crossover method.

At step 3, the processing element 1 of the cell A calculates the sum of two stored data (a+d) and (c+b) of the dedicated communication register 16. The processing element 1 of the cell B calculates the sum of two stored data (b+a) and (d+c) of the dedicated communication register 16. The processing element 1 of the cell C calculates the sum of two stored data (c+b) and (a+d). The processing element 1 of the cell D calculates the sum of two stored data (d+c) and (b+a) of the dedicated communication register 16.

Thus, the global operation that calculates the sum of local data is executed. In such a global operation, when the number of processing elements 1 is 2^n, each of load, store, and operation is executed n times. In addition, data are sent according to the crossover method in such a manner that the first data is sent to the adjacent processing element 1, the next data is sent to a processing element 1 that is away from the processing element 1 by two processing elements, the third data is sent to a processing element 1 that is away from the processing element 1 by four processing elements, and so forth. Thus, when the number of processing elements 1 is 2^n, the dedicated communication register 16 of each processing element 1 should have n registers.

As described above, according to the present invention, when a message such as PUT/GET that directly accesses a memory and does not have an explicit receive command is used, since a flag used to protect send/receive regions necessary for sending and receiving a message is updated by hardware, the operation of the processor is not affected by an interrupt. When requests of PUT/GET are sent without blocking the procession, communication and calculation can be completely overlapped, thereby remarkably improving executing efficiency of the parallel computer.

When a message such as PUT that directly accesses the memory is used, since a dedicated communication register to which data indicated by the message is sent is used, the overhead for accessing the memory can be remarkably reduced. In addition, since the data storage state of the dedicated communication register is represented with a flag, the access of the processor to the dedicated communication register can be controlled by hardware. Since the overhead of the software is reduced, a processing element can execute the global operation process, the broadcast process, and the barrier synchronizing process at higher speed.

Although the present invention has been shown and described with respect to a best mode embodiment thereof, it should be understood by those skilled in the art that the foregoing and various other changes, omissions, and additions in the form and detail thereof may be made therein without departing from the spirit and scope of the present invention.

What is claimed is:

1. A parallel computer including a plurality of processing elements, each of the processing elements comprising:
  processor means for executing instructions and processing data; and
  communication control means constructed of hardware, comprising:
    flag address holding means for temporarily holding an address of a send complete flag of a direct remote write message when the direct remote write message is sent to one of the plurality of processing elements; and
    flag update means exclusively updating a flag indicated by the address held in said flag address holding
    means when transmission of data indicated by the direct remote write message is completed independently from execution and processing of said processor means.

2. The parallel computer as set forth in claim 1, wherein said flag update means updates said flag indicated by the address held in said flag address holding means a predetermined number of times to represent a completion of the transmission.

3. The parallel computer as set forth in claim 1,
  wherein said flag address holding means further holds an address of an acknowledgement flag that represents that the direct remote write message has been received; and
  wherein said flag update means updates the acknowledgement flag indicated by the address thereof held in said flag address holding means when the direct remote write message is sent and updates the acknowledgement flag indicated by the address thereof held in said flag address holding means in an inverse mode when a receive complete message is received from one of the plurality of processing elements in response to the direct remote write message.

4. A parallel computer including a plurality of processing elements, each of processing elements comprising:
  processor means for executing instructions and processing data; and
  communication control means constructed of hardware, comprising:
    flag address holding means for temporarily holding an address of a receive complete flag of a direct remote write message when the direct remote write message is received from one of the plurality of processing elements; and
    flag update means for exclusively updating a flag indicated by the address held in said flag address holding means when reception of data indicated by the direct remote write message has been completed independently from execution and processing of said processor means.

5. The parallel computer as set forth in claim 4, wherein said flag update means updates said flag a predetermined number of times to represent a completion of the reception.

6. A parallel computer including a plurality of processing elements, each of processing elements comprising:
  processor means for executing instructions and processing data; and
  communication control means constructed of hardware, comprising:
    flag update means for exclusively updating a flag indicated by an address of an acquisition complete flag of a direct remote read message when reply data indicated by the direct remote read message has been received from one of the plurality of processing elements independently from execution and processing of said processor means,
  wherein the flag is updated a predetermined number of times to represent that the reply data has been received.

7. A parallel computer including a plurality of processing elements, each of processing elements comprising:
  processing means for executing instructions and processing data; and
  communication control means constructed of hardware, comprising:
    flag address holding means for temporarily holding an address of a reply complete flag of a direct remote read message when the direct remote read message is read from one of the plurality of processing elements; and
    flag update means for exclusively updating a flag indicated by an address held in said flag address holding means when data of the direct remote read message has been sent independently from execution and processing of said processor means.

8. The parallel computer as set forth in claim 7, wherein said flag update means updates said flag a predetermined number of times to represent data of said message has been sent.

9. A parallel computer including a plurality of processing elements, each of the processing elements comprising:
  a dedicated communication register having a plurality of registers and a plurality of flags corresponding to each of the plurality of registers, said registers storing data indicated by a direct remote write message received from one of the plurality of processing elements, the flags managing bit values of data storage states of corresponding registers,
  wherein data stored in said dedicated communication register is used to reference data of one of the plurality of processing elements.

10. The parallel computer as set forth in claim 9, wherein the processing element executes a task that is in ready state when a flag of said dedicated communication register represents that required data is not stored.

11. The parallel computer as set forth in claim 9, wherein a processing element performs an operation for both data stored in said dedicated communication register and local data and sends calculated resultant data to said dedicated communication register of one of the plurality of processing elements with the direct remote write message so as to execute a global operation.

12. The parallel computer as set forth in claim 11, wherein the processing element which participates in the global operation process further sends ID information of a processing element that holds particular calculated result data.

13. The parallel computer as set forth in claim 11, wherein the processing element which participates in the global operation process determines a processing element to which the calculated resultant data is sent according to a crossover method.

14. The parallel computer as set forth in claim 13, wherein the dedicated communication register of the processing element which participates in the global operation process has log n registers where the number of processing elements which participate in the global operation process is n.

15. The parallel computer as set forth in claim 9, wherein a processing element sends broadcast data to said dedicated communication register of one of the plurality of processing elements using a direct remote write message so as to execute a broadcast process.

16. The parallel computer as set forth in claim 9, wherein a processing element performs an operation for both data stored in said dedicated communication register and local data and sends calculated resultant data to said dedicated communication register of one of the plurality of processing elements using a direct remote write message so as to execute a barrier synchronizing process.

17. The parallel computer as set forth in claim 9, wherein a processing element performs a predetermined operation for data stored in said dedicated communication register and sends calculated resultant data to said dedicated communication register of one of the plurality of processing elements using a direct remote write message so as to execute a recognizing process of status of barrier synchronization.

18. A parallel computer including a plurality of processing elements, comprising:
  a first processing element having a first dedicated communication means, constructed of hardware, wherein said first dedicated communication means has a plurality of registers and a plurality of flags corresponding to each of the plurality of registers, said registers store data received from second processing elements, the flags manage bit values of data storage states of corresponding registers, and data stored in the first dedicated communication means is used to reference data of the second processing elements; and
  said second processing elements, each having second dedicated communication means, constructed of hardware, to transmit the flags to said first processing element.

* * * * *